--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      name:  <unnamed>
       log:  /Users/talgross/Dropbox/tmp/noaa-data-in-china/clean-noaa-data.log
  log type:  text
 opened on:  21 Apr 2014, 12:18:12

. 
. /* --------------------------------------
> 
> AUTHOR: Tal Gross
> 
> PURPOSE: Clean the CSV files I got from NOAA
> for two cities in China (Shanghai and Beijing).
> 
> DATE CREATED: April 19, 2014
> 
> NOTES:
> 
> --------------------------------------- */
. 
. clear all

. estimates clear

. set mem 500m
(512000k)

. 
. ************************************************************
. **   Bring in CSV files
. ************************************************************
. 
. insheet using 324215.csv , comma names
(21 vars, 1420 obs)

. d, f

Contains data
  obs:         1,420                          
 vars:            21                          
 size:        92,300 (99.9% of memory free)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
station         str17  %17s                   STATION
station_name    str10  %10s                   STATION_NAME
elevation       byte   %8.0g                  ELEVATION
latitude        float  %9.0g                  LATITUDE
longitude       float  %9.0g                  LONGITUDE
date            long   %12.0g                 DATE
prcp            int    %8.0g                  PRCP
measurementflag str1   %9s                    Measurement Flag
qualityflag     str1   %9s                    Quality Flag
sourceflag      str1   %9s                    Source Flag
timeofobservation
                int    %8.0g                  Time of Observation
tmax            int    %8.0g                  TMAX
v13             str1   %9s                    Measurement Flag
v14             str1   %9s                    Quality Flag
v15             str1   %9s                    Source Flag
v16             int    %8.0g                  Time of Observation
tmin            int    %8.0g                  TMIN
v18             str1   %9s                    Measurement Flag
v19             str1   %9s                    Quality Flag
v20             str1   %9s                    Source Flag
v21             int    %8.0g                  Time of Observation
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:  
     Note:  dataset has changed since last saved

. 
. tempfile t324215 

. save `t324215'
file /var/folders/4p/3yn06t_10rnfdzjgk9k4kjyw0000gn/T//S_01067.000009 saved

. 
. insheet using 324335.csv , comma names clear
(21 vars, 1533 obs)

. 
. append using `t324215'

. 
. 
. ************************************************************
. **   Clean up precipitation
. ************************************************************
. 
. sum prcp

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        prcp |      2953   -1366.296    3472.917      -9999       1895

. rename prcp precipitation

. replace precipitation = . if precipitation == -9999
(411 real changes made, 411 to missing)

. 
. ************************************************************
. **   Clean up temperature 
. ************************************************************
. 
. ** The temperatures are recorded in tenths of a degree Celsius
. foreach var in max min {
  2.         sum t`var'
  3.         replace t`var' = . if t`var' == -9999
  4.         replace t`var' = t`var' / 10
  5.         rename t`var' t`var'_cel
  6.         gen t`var'_fah = (9 * t`var'_cel) / 5 + 32
  7. }

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        tmax |      2953   -612.8002    2743.172      -9999        406
(232 real changes made, 232 to missing)
tmax was int now float
(2716 real changes made)
(232 missing values generated)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        tmin |      2953   -1122.367    3309.599      -9999        318
(360 real changes made, 360 to missing)
tmin was int now float
(2585 real changes made)
(360 missing values generated)

. 
. codebook tmax_cel tmin_cel precipitation

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
tmax_cel                                                                                                                                                                                            TMAX
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                  type:  numeric (float)

                 range:  [-8.5,40.6]                  units:  .1
         unique values:  432                      missing .:  232/2953

                  mean:   18.7493
              std. dev:   10.7927

           percentiles:        10%       25%       50%       75%       90%
                               3.8       9.4        20      27.8      32.4

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
tmin_cel                                                                                                                                                                                            TMIN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                  type:  numeric (float)

                 range:  [-16.7,31.8]                 units:  .1
         unique values:  428                      missing .:  360/2953

                  mean:   11.0023
              std. dev:   10.8112

           percentiles:        10%       25%       50%       75%       90%
                              -3.9       2.2      11.5      20.5      24.8

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
precipitation                                                                                                                                                                                       PRCP
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                  type:  numeric (int)

                 range:  [0,1895]                     units:  1
         unique values:  243                      missing .:  411/2953

                  mean:   29.4717
              std. dev:    95.308

           percentiles:        10%       25%       50%       75%       90%
                                 0         0         0         8        86

. 
. ************************************************************
. **   Clean up date
. ************************************************************
. 
. sum date

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        date |      2953    2.01e+07    12125.62   2.01e+07   2.01e+07

. list date in 1/10

     +----------+
     |     date |
     |----------|
  1. | 20100101 |
  2. | 20100102 |
  3. | 20100103 |
  4. | 20100104 |
  5. | 20100105 |
     |----------|
  6. | 20100106 |
  7. | 20100107 |
  8. | 20100108 |
  9. | 20100109 |
 10. | 20100110 |
     +----------+

. gen year = floor(date/1e4)

. gen month = floor((date - year * 1e4 ) / 1e2)

. gen day = date - 1e4 * year - 1e2 * month

. list date year month day in 1/10

     +-------------------------------+
     |     date   year   month   day |
     |-------------------------------|
  1. | 20100101   2010       1     1 |
  2. | 20100102   2010       1     2 |
  3. | 20100103   2010       1     3 |
  4. | 20100104   2010       1     4 |
  5. | 20100105   2010       1     5 |
     |-------------------------------|
  6. | 20100106   2010       1     6 |
  7. | 20100107   2010       1     7 |
  8. | 20100108   2010       1     8 |
  9. | 20100109   2010       1     9 |
 10. | 20100110   2010       1    10 |
     +-------------------------------+

. rename date date_orig

. gen date = mdy( month , day , year)

. format date %td

. 
. codebook date

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
date                                                                                                                                                                                         (unlabeled)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                  type:  numeric daily date (float)

                 range:  [18263,19827]                units:  1
       or equivalently:  [01jan2010,14apr2014]        units:  days
         unique values:  1550                     missing .:  0/2953

                  mean:   19010.3 = 18jan2012 (+ 7 hours)
              std. dev:   440.695

           percentiles:        10%       25%       50%       75%       90%
                             18410     18632     19001     19372     19627
                         28may2010 05jan2011 09jan2012 14jan2013 26sep2013

. 
. ************************************************************
. **   Sanity checks
. ************************************************************
. 
. ** Mean daily low and high temperature (Fahrenheit) by month
. table month , c(mean tmin_fah mean tmax_fah)

------------------------------------------
    month | mean(tmin_fah)  mean(tmax_fah)
----------+-------------------------------
        1 |       26.71769        40.85184
        2 |       30.48701        44.67331
        3 |       38.66523        55.62159
        4 |       49.74251         67.8914
        5 |       62.05581        79.64701
        6 |       69.88745        83.29443
        7 |       77.60636        91.67776
        8 |       76.36784        89.18387
        9 |       66.78278          80.609
       10 |       54.97914        70.01714
       11 |       42.55941        57.61436
       12 |       30.60265        44.19558
------------------------------------------

. 
. ************************************************************
. **   Polish off
. ************************************************************
. 
. compress
year was float now int
month was float now byte
day was float now byte
date was float now int

. keep date station_name precipitation tmax_* tmin_* 

. sort station_name date

. 
. save cleaned-shanghai-beijing-noaa.dta , replace
(note: file cleaned-shanghai-beijing-noaa.dta not found)
file cleaned-shanghai-beijing-noaa.dta saved

. 
. 
. log close
      name:  <unnamed>
       log:  /Users/talgross/Dropbox/tmp/noaa-data-in-china/clean-noaa-data.log
  log type:  text
 closed on:  21 Apr 2014, 12:18:12
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
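
The cleaning steps logged above could be condensed roughly as follows. This is a minimal sketch, not the author's do-file: it assumes the same two CSV files and variable names shown in the log, swaps in `mvdecode` for the explicit `replace ... if ... == -9999` lines, and swaps in `date()` with a "YMD" mask for the floor-based year/month/day arithmetic. The `assert` and `count` lines at the end are added checks that do not appear in the original run.

```stata
* Sketch only: condenses the logged steps; not the original do-file.
clear all

* Read both station files and stack them (file names as in the log).
insheet using 324215.csv, comma names clear
tempfile t324215
save `t324215'
insheet using 324335.csv, comma names clear
append using `t324215'

* The log shows missing readings coded as -9999; recode all three
* measures to Stata's system missing value in one step.
mvdecode prcp tmax tmin, mv(-9999)
rename prcp precipitation

* Temperatures arrive in tenths of a degree Celsius.
foreach var in max min {
    replace t`var' = t`var' / 10
    rename t`var' t`var'_cel
    gen t`var'_fah = (9 * t`var'_cel) / 5 + 32
}

* Parse the YYYYMMDD integer into a Stata daily date
* (alternative to the floor() arithmetic used in the log).
rename date date_orig
gen date = date(string(date_orig, "%8.0f"), "YMD")
format date %td

* Added checks (assumptions, not part of the logged run):
* every row should parse to a valid date ...
assert !missing(date)
* ... and it is worth counting days where the low exceeds the high.
count if tmin_cel > tmax_cel & !missing(tmin_cel, tmax_cel)
```

Using `count` rather than `assert` for the temperature comparison is deliberate: the log gives no evidence either way on whether such rows exist, so the sketch reports them instead of halting.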
